1.  SUMMARY  OF  COMPLETED  PROJECT

The research funded during this period led to more than 30 accepted or published papers in refereed
statistical journals. Several graduate students were involved in the research during this period; they
contributed significantly to the success of the project and benefited from opportunities for educational
and research achievements leading to the Ph.D. degree (Sam Hawala, Kathryn Prewitt and Kai-Sheng
Song). The research achievements for this project fall roughly into five areas, all of which
are in nonparametric statistics. They cover a wide range of topics from the applied to the theoretical
and have important implications for data analysis, as well as for the theory of statistics. Note that some
papers could be classified into several of these categories; the numbers in the following refer to the list of
published papers funded by this grant, given in Section 5.1 below.

A.  Nonparametric  inference  for  incomplete  data  subject  to  censoring/truncation. 

This research has applications in theoretical and applied reliability ([12], [14], [15], [16], [17],
[25], [28]). A major theoretical result was achieved, for instance, in [15].

B.  Nonparametric  estimation  of  density  functions  and  failure  rates  for  incomplete  data. 

Important applications are in monitoring and quality control technology ([5], [11], [13], [24]).
Particularly noteworthy is [24], producing an entirely data-based adaptive smoothing method for hazard and
density functions and their derivatives under random censoring.

C. Nonparametric analysis of discontinuities in hazard functions, regression curves and multidimensional
surfaces.

Applications  of  this  research  are  in  statistical  image  analysis  and  nonparametric  change-point  methods 
([7],  [26],  [27],  [29]).  A  particularly  innovative  method  of  change-point  detection  in  the  context  of  smooth 
regression  functions  was  proposed  in  [7]. 

D. Adaptive estimation of spectra and peaks in spectra for stationary processes, and of regression functions
and surfaces.

Applications are in signal processing and nonparametric curve estimation ([1], [9], [19], [22]). A
central result is [9].

E. Topics in nonparametric regression, especially diagnostics for nonparametric regression and variance
function modelling.

Applications are in regression-type problems ([2], [3], [4], [6], [8], [10], [18], [20], [21], [23], [30], [31]).
An entirely new method of improved one- and multidimensional nonparametric regression is, for instance,
[20].

2.  COMPLETED  RESEARCH

This section contains more detailed descriptions of the research achievements in the five main areas
A-E of research which were funded by the grant.

2.1. Nonparametric inference for incomplete data subject to censoring/truncation (Area A).
([12], [14], [15], [16], [17], [25], [28]).

Weak and strong quantile representations for randomly truncated data with applications. Suppose that
we observe bivariate data $(X_i, Y_i)$ only when $Y_i < X_i$ (left truncation). Denote with F the marginal
d.f. of the X's. In this paper we derive a Bahadur-type representation for the quantile function of
the pertaining product-limit estimator of F. As an application we obtain confidence intervals and
bands for quantiles of F.

On the Hajek projection for truncated and censored data. Large sample properties of the product-limit
estimators for truncated or censored data are usually obtained via the empirical cumulative hazard
function estimators. The Hajek projection of the empirical cumulative hazard function estimator is
derived for truncated data and expressed for censored data. It turns out that both projections are
asymptotically $n^{1/2}$-equivalent but not equal to the respective influence curves. Weak convergence
of the empirical cumulative hazard processes is deduced accordingly.

A strong law under random censorship. Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with
d.f. F. We observe $Z_i = \min(X_i, Y_i)$ and $\delta_i = 1_{\{X_i \le Y_i\}}$, where $Y_1, Y_2, \ldots$ is a
sequence of i.i.d. censoring random variables. Denote with $F_n$ the Kaplan-Meier estimator of F. We show
that for any F-integrable function $\varphi$, $\int \varphi \, dF_n$ converges almost surely and in the mean.
The result may be applied to yield consistency of many estimators under random censorship.
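
The Kaplan-Meier integral in this abstract can be illustrated numerically. The following is a minimal
sketch under stated assumptions, not code from the paper: it assumes NumPy, and the function names
kaplan_meier_weights and km_integral are hypothetical. It computes the jump of the Kaplan-Meier
estimator at each ordered observation and forms the weighted sum that equals $\int \varphi \, dF_n$.

```python
import numpy as np

def kaplan_meier_weights(z, delta):
    """Return the sorted observations and the Kaplan-Meier jump at each one.

    z     : observed times Z_i = min(X_i, Y_i)
    delta : indicators, 1 if uncensored (X_i <= Y_i), 0 if censored
    """
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=float)
    # Sort by time; at tied times uncensored observations come first.
    order = np.lexsort((-delta, z))
    z, delta = z[order], delta[order]
    n = len(z)
    weights = np.empty(n)
    surv = 1.0  # running product of ((n-j-1)/(n-j))**delta_j over earlier ranks j
    for i in range(n):
        weights[i] = surv * delta[i] / (n - i)
        surv *= ((n - i - 1) / (n - i)) ** delta[i]
    return z, weights

def km_integral(phi, z, delta):
    """Approximate the integral of phi with respect to the Kaplan-Meier estimator."""
    zs, w = kaplan_meier_weights(z, delta)
    return float(np.sum(w * phi(zs)))

if __name__ == "__main__":
    # Toy data (made up for illustration): the second observation is censored.
    print(km_integral(lambda x: x, [1.0, 2.0, 3.0], [1, 0, 1]))
```

Without censoring every weight equals 1/n, so the integral reduces to the sample average of
$\varphi$; a censored observation receives weight zero and passes its mass on to later observations.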

Strong representations of the survival function estimator for truncated and censored data with
applications. A strong i.i.d. representation is obtained for the product-limit estimator of the survival
function based on left truncated and right censored data. This extends the result of Chao and Lo
(1988, Ann. Statist. 16, 661-668) for truncated data. An improved rate of the approximation is also
obtained on compact sets. Applications include density and hazard rate estimation. The advantage
of the improved rate of the approximation is illustrated via kernel density estimation.

Multi-Sample U-Statistics for censored data. In this paper we prove almost sure convergence of
multi-sample U-statistics under random censorship. As an application we obtain consistency of a new
class of tests for equality in distribution.

The jackknife estimate of a Kaplan-Meier integral. We derive an explicit formula for the jackknife
estimate of a Kaplan-Meier integral. From this the asymptotic analysis of the jackknifed Kaplan-Meier
process becomes straightforward. In a small simulation study it is demonstrated that jackknifing
may lead to a considerable reduction of the bias.

M-estimators for censored data: Strong consistency. Let $F_n(x)$ denote the Kaplan-Meier product-limit
estimate for the life distribution function $F(x; \theta_0)$ based on randomly censored data. The
M-estimator of $\theta_0$ corresponding to a function $\rho$ is defined to be the value of $\theta$ which
minimizes $\int \rho(x; \theta) \, dF_n(x)$. The strong consistency of M-estimators is studied. It is shown
that most of the classical sufficient conditions based on $\rho$, such as those of Wald (1949), Kiefer and
Wolfowitz (1956) and Huber (1967), can be extended to randomly censored data. Two such extensions based
on Perlman (1972) and Wang (1985) are illustrated in detail and applied to parametric, semi- and
non-parametric classes.

2.2. Nonparametric estimation of density functions and failure rates for incomplete data
(Area B). ([5], [11], [13], [24])

A comparison of adaptive hazard rate estimators for left truncated and right censored data. Left
truncation and right censoring arise frequently in practice for life data. This paper is concerned with the
estimation  of  the  hazard  rate  function  for  such  data.  Two  types  of  nonparametric  estimators  based 
on  kernel  smoothing  methods  are  considered.  The  first  one  is  obtained  by  convolving  a  kernel  with 
a  cumulative  hazard  estimator.  The  second  one  is  in  the  form  of  a  ratio  of  two  statistics.  Local 
properties including consistency, asymptotic normality and mean squared error expressions are
presented for both estimators. These properties facilitate locally adaptive bandwidth choice. The two
types  of  estimators  are  then  compared  based  on  their  theoretical  and  empirical  performances.  The 
effect  of  overlooking  the  truncation  factor  is  demonstrated  through  the  Channing  House  data. 
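
The first estimator type, a kernel convolved with a cumulative hazard estimator, can be sketched as
follows. This is an illustrative sketch only, not the paper's implementation: it assumes NumPy, a
fixed (non-adaptive) bandwidth h, an Epanechnikov kernel, and a Nelson-Aalen type cumulative hazard
whose risk set at time t counts subjects with truncation time at most t and observed time at least t.
The names epanechnikov and kernel_hazard are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel with support [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_hazard(t_grid, trunc, z, delta, h):
    """Kernel-smoothed hazard rate for left truncated, right censored data.

    trunc : left truncation times (trunc_i <= z_i)
    z     : observed times (failure or censoring)
    delta : 1 if z_i is a failure time, 0 if censored
    h     : fixed bandwidth (assumed here; the paper studies adaptive choices)
    """
    trunc, z, delta = (np.asarray(a, dtype=float) for a in (trunc, z, delta))
    # Risk set size at each observed time: already entered, not yet failed or censored.
    at_risk = np.array([np.sum((trunc <= zi) & (z >= zi)) for zi in z])
    # Jumps of the cumulative hazard estimator; at_risk >= 1 because subject i counts itself.
    increments = delta / at_risk
    t_grid = np.asarray(t_grid, dtype=float)
    u = (t_grid[:, None] - z[None, :]) / h
    return (epanechnikov(u) * increments[None, :]).sum(axis=1) / h

if __name__ == "__main__":
    # Toy data (made up): the second subject enters the study late, at time 1.5.
    with_truncation = kernel_hazard([1.0], [0.0, 1.5], [1.0, 2.0], [1, 1], 1.0)
    ignoring_truncation = kernel_hazard([1.0], [0.0, 0.0], [1.0, 2.0], [1, 1], 1.0)
    print(with_truncation, ignoring_truncation)
```

In the toy data, treating the late entrant as if it had been at risk from time zero inflates the risk
set at the first failure and halves the estimated hazard there, which is the kind of distortion that
overlooking truncation produces.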

Nonparametric maximum likelihood estimation of an increasing hazard rate for uncertain cause-of-death
data. In Kaplan-Meier estimation of the survival function for diseased animals the cause of death
has to be specified with certainty. When pathologists are unable to do so, forced cause-of-death data
can create substantial biases. For unidentifiable cause-of-death data we derive the nonparametric
maximum likelihood estimator of the hazard rate due to the disease, assuming it is increasing, when
the survival function without the disease is known or can be well estimated. Strong consistency of
the maximum likelihood estimator is also obtained. Such a model arises in carcinogenesis bioassays.
For  example,  consider  an  experiment  in  which  laboratory  animals  are  followed  and  examined  on 
death  for  the  existence  of  a  particular  type  of  tumor  and  to  determine  if  this  was  the  cause  of  death. 
When  diagnosis  can  be  given  with  certainty  the  usual  life-table  or  Kaplan-Meier  (1958)  approach 
is applicable and simplifies the statistical analysis. However, it often happens (Dinse (1986)) that
the pathologists cannot assert the cause of death, and forcing the cause of death to be specified
may result in substantial bias (Racine-Poon and Hoel (1984)). Our approach does not require the
knowledge of the cause of death and assumes instead that the survival function of death due to all
other competing risks can be determined quite accurately from other sources. Such a model arises,
for example, in an engineering context when a new device is added to an existing system whose survival
function  is  well  understood  and  assumed  to  be  known,  and  it  is  not  possible  or  too  costly  to  specify 
whether  a  failure  of  the  new  system  is  due  to  the  new  device  or  not.  If  the  life  distribution  of  the 
risk  of  interest  has  increasing  hazard  rate,  the  nonparametric  maximum  likelihood  estimator  of  the 
hazard  rates  reduces  to  the  solution  of  a  nonstandard  optimization  problem.  A  solution  is  given 
in  this  paper  in  the  form  of  the  max-min  formula  together  with  a  computation  algorithm  based  on 
the maximum upper sets. Nonparametric maximum likelihood estimators for the distribution and
density functions are then implied by the invariance principle. The proof of the strong consistency
utilizes the total time on test transformation in a nonstandard way and generalizes a useful lemma
in isotonic procedures due to Marshall (1970).

Nonparametric estimation of hazard functions and their derivatives under truncation model.
Nonparametric kernel estimators for hazard functions and their derivatives are considered under the random
left truncation model. Each estimator takes the form of a sum of identically distributed but dependent
random variables. Exact and asymptotic expressions for the biases and variances of the estimators
are derived. Mean square consistency and local asymptotic normality of the estimators are
established. Adaptive local bandwidths are obtained by estimating the optimal bandwidths consistently.

Hazard  rate  estimation  under  random  censoring  with  varying  kernels  and  bandwidths.  We  discuss  the 
estimation  of  hazard  rates  under  random  censoring  with  the  kernel  method.  Two  practically  relevant 
problems  which  occur  when  applying  unmodified  kernel  estimators  are  boundary  effects  near  the 
endpoints  of  the  support  of  the  hazard  rate,  and  a  substantial  increase  in  the  variance  from  left  to 
right over the range of abscissae where the hazard rate is estimated. A new class of boundary kernels
is proposed for the first problem. Explicit formulas for these kernels are developed, and it is shown
that this boundary correction works well in practice. A data-adaptive varying bandwidth selection
procedure is proposed for the second problem. This procedure generally will lead to increasing
bandwidths  near  the  left  endpoint  and  towards  the  right  endpoint  and  will  lead  to  smaller  integrated 
mean  squared  error  of  the  hazard  rate  estimator  as  compared  to  a  fixed  bandwidth  method.  A 
practically  feasible  method  incorporating  the  new  boundary  kernels  and  local  bandwidth  choices  is 
implemented  and  illustrated  with  survival  data  from  a  leukemia  study. 

2.3.  Nonparametric  analysis  of  discontinuities  in  hazard  functions,  regression  curves  and 
multidimensional  surfaces.  (Area  C).  ([7],  [26],  [27],  [29]) 

Change-points  in  nonparametric  regression  analysis.  Estimators  for  location  and  size  of  a  discontinuity 
or  change-point  in  an  otherwise  smooth  regression  model  are  proposed.  The  assumptions  needed 
are  much  weaker  than  those  made  in  parametric  models.  The  proposed  estimators  apply  as  well 
to  the  detection  of  discontinuities  in  derivatives  and  therefore  to  the  detection  of  change-points  of 
slope  and  of  higher  order  curvature.  The  proposed  estimators  are  based  on  a  comparison  of  left  and 
right  one-sided  kernel  smoothers.  Weak  convergence  of  a  stochastic  process  in  local  differences  to  a 
Gaussian process is established for properly scaled versions of estimators of the location of a
change-point. The continuous mapping theorem can then be invoked to obtain asymptotic distributions
and corresponding rates of convergence for change-point estimators. These rates are typically faster
than $n^{-1/2}$. Rates of global $L^p$ convergence of curve estimates with appropriate kernel modifications
adapting to estimated change-points are derived as a consequence. It is shown that these rates of
convergence are the same as if the location of the change-point were known. The methods are
illustrated by means of the well-known data on the annual flow volume of the Nile River from
1871 to 1970.
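
The idea of comparing left and right one-sided kernel smoothers can be sketched in a few lines. The
following is a simplified illustration under stated assumptions, not the estimator analyzed in the
paper: it assumes NumPy, a fixed bandwidth h, one-sided weights proportional to 1 - u^2, and a grid
search for the location at which the absolute difference of the two smoothers is largest. The function
names are hypothetical.

```python
import numpy as np

def one_sided_difference(t, x, y, h):
    """Right-sided minus left-sided weighted average of y around the point t."""
    u = (x - t) / h
    w_right = np.where((u > 0) & (u <= 1), 1.0 - u**2, 0.0)   # uses data in (t, t + h]
    w_left = np.where((u < 0) & (u >= -1), 1.0 - u**2, 0.0)   # uses data in [t - h, t)
    m_right = np.sum(w_right * y) / np.sum(w_right)
    m_left = np.sum(w_left * y) / np.sum(w_left)
    return m_right - m_left

def estimate_change_point(x, y, h, grid):
    """Return the grid point with the largest absolute one-sided difference and that difference.

    The grid should stay at least h away from both ends of the design so that
    both one-sided windows contain data.
    """
    diffs = np.array([one_sided_difference(t, x, y, h) for t in grid])
    k = int(np.argmax(np.abs(diffs)))
    return float(grid[k]), float(diffs[k])

if __name__ == "__main__":
    # Noise-free toy curve (made up): a mild linear trend with a jump of size 1 at 0.5.
    x = (np.arange(200) + 0.5) / 200
    y = 0.2 * x + (x > 0.5)
    print(estimate_change_point(x, y, h=0.1, grid=np.linspace(0.2, 0.8, 121)))
```

The returned difference serves as an estimate of the jump size; away from the discontinuity the two
one-sided averages nearly agree, so the difference stays small there.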

Maximin estimation of multivariate boundaries. We consider the problem of estimating the location
and size of a discontinuity in an otherwise smooth multidimensional regression function. The
boundary or location of the discontinuity is assumed to be a closed curve or surface, respectively, and
we aim to estimate this closed set. Our approach utilizes the uniform convergence of multivariate
kernel estimators for directional limits. Differences of such limits converge to zero under smoothness
assumptions, and to the jump size along the discontinuity. This leads to the proposal of a maximin
estimator, which selects the boundary for which the minimal estimated directional difference among
all points belonging to this boundary is maximized. It is shown that this estimated boundary is
almost surely enclosed in a sequence of shrinking neighborhoods around the true boundary, and
corresponding rates of convergence are obtained.

Change-point  models  for  hazard  functions.  A  review  is  presented  of  parametric  and  nonparametric 
models  and  corresponding  estimation  procedures  for  change-points  in  hazard  functions  where  the 
data  are  possibly  subject  to  random  censoring.  In  particular,  we  discuss  nonparametric  models  and 
the  application  of  nonparametric  smoothing  techniques  for  change-point  estimation  and  estimation 
of  a  hazard  function  when  a  change-point  is  present.  Preliminary  theoretical  results  are  mentioned 
and  a  simulation  study  provides  further  insight. 

Cube  Splitting  in  Multidimensional  Edge  Estimation.  Assume  noisy  measurements  are  available  and 
that  an  edge  or  boundary  is  given  which  induces  a  partition  of  the  domain  into  two  subsets.  The 
regression function on one subset is equal to a constant $c_1$, on the other subset to a constant $c_2$. Each
measurement is made within a regular pixel. The problem we consider is the estimation of the edge or
boundary curve (change curve), for the case that the domain is in $\mathbb{R}^d$. We propose to seek boundary
estimates  as  maximizers  of  a  weighted  squared  difference  statistic  where  we  maximize  over  unions 
of  cubes  of  aggregated  pixels.  Rates  of  almost  sure  convergence  of  this  procedure  are  established. 
Its  central  advantage  is  its  numerical  feasibility,  as  the  number  of  cubes  of  aggregated  pixels  to  be 
investigated for inclusion in one of the partitioning sets can be kept small. A numerically efficient
“cube splitting” (“CUSP”) algorithm is suggested which implements this proposal: Start with an
iteratively grown union of big cubes of aggregated pixels to find a first approximate edge/boundary
estimate on a coarse level of approximation. Then split those cubes falling near the boundary into
smaller  cubes  and  check  their  allocation  to  one  of  the  partitioning  sets  in  order  to  obtain  a  more 
refined  boundary  estimate.  This  cube  splitting  (refinement)  step  may  then  be  iterated  until  the 
desired  level  of  resolution  is  achieved. 

2.4.  Adaptive  estimation  of  spectra  and  peaks  in  spectra  for  stationary  processes,  and  of 
regression  functions  and  surfaces.  (Area  D).  ([1],  [9],  [19],  [22]) 

Applications  of  multiparameter  weak  convergence  for  adaptive  nonparametric  curve  estimation.  We 
give  an  overview  on  applications  of  weak  convergence  of  stochastic  processes  to  obtain  adaptive 
nonparametric  curve  estimators  through  efficient  data-based  local  bandwidth  choices.  We  point  out 
new  developments  based  on  multivariate  time  stochastic  processes.  Examples  are  multivariate  curve 
estimates,  where  several  local  bandwidths  are  to  be  chosen  for  different  coordinates,  and  estimates 
of  local  functionals  of  curves  which  can  be  expressed  as  maxima  or  zeros  of  local  deviation  processes 
and  also  depend  on  a  bandwidth.  As  an  illustration,  we  show  that  adaptive  mode  estimation  for  a 
probability  density  function  is  a  consequence  of  weak  convergence  of  a  two-dimensional  process  in 
a  bandwidth  and  a  deviation  coordinate.  Various  adaptive  mode  estimators  are  discussed. 

Weak  convergence  and  adaptive  peak  estimation  for  spectral  densities.  Adaptive  nonparametric  kernel 
estimators for the location of a peak of the spectral density of a stationary time series are
proposed and investigated. They are based on direct smoothing of the periodogram where the amount
of smoothing is determined automatically in an asymptotically optimal fashion. These adaptive
estimators minimize the asymptotic mean squared error. Adaptivity is derived from the weak
convergence of a two-parameter stochastic process in a deviation and a bandwidth coordinate to a
Gaussian  limit  process.  Efficient  global  and  local  bandwidth  choices  which  lead  to  adaptive  peak 
estimators  and  practical  aspects  are  discussed. 
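
Direct smoothing of the periodogram followed by location of its maximum can be sketched as below.
This is only an illustration under stated assumptions: it uses NumPy, an Epanechnikov kernel, and a
bandwidth supplied by the user, whereas the paper determines the amount of smoothing from the data.
The function names periodogram and smoothed_peak are hypothetical.

```python
import numpy as np

def periodogram(x):
    """Periodogram of a mean-corrected series at the Fourier frequencies in [0, 0.5]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(n)  # cycles per unit time
    pgram = np.abs(np.fft.rfft(x)) ** 2 / (2.0 * np.pi * n)
    return freqs, pgram

def smoothed_peak(x, bandwidth):
    """Kernel-smooth the periodogram and return the frequency of its maximum.

    bandwidth is fixed by the caller here (an assumption of this sketch).
    """
    freqs, pgram = periodogram(x)
    u = (freqs[:, None] - freqs[None, :]) / bandwidth
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    # Normalized weights; each row sum is positive because u = 0 on the diagonal.
    smooth = (k * pgram[None, :]).sum(axis=1) / k.sum(axis=1)
    j = int(np.argmax(smooth))
    return float(freqs[j]), smooth

if __name__ == "__main__":
    # Toy series (made up): a pure cosine with frequency 0.1 cycles per time unit.
    t = np.arange(200)
    peak, _ = smoothed_peak(np.cos(2.0 * np.pi * 0.1 * t), bandwidth=0.02)
    print(peak)
```

A larger bandwidth reduces the variance of the smoothed periodogram but flattens and may shift the
peak, which is the trade-off that a data-based bandwidth choice is meant to balance.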

Comment to “Local regression: automatic kernel carpentry”. A discussion of a paper on weighted local
least squares methods and comparisons with kernel methods.

Multiparameter bandwidth processes and adaptive surface smoothing. We derive a functional limit
theorem for a sequence of bandwidth processes with multivariate time and show that the limit process
is  multivariate  Gaussian.  This  theorem  is  then  applied  to  show  asymptotic  efficiency  of  certain 
data-adaptive  local  bandwidth  choices  for  kernel  estimators  of  multivariate  regression  functions  and 
their  derivatives.  The  cases  where  optimal  multivariate  bandwidths  exist  as  minimizers  of  leading 
mean  squared  error  terms  are  characterized. 

2.5. Methods for nonparametric regression, especially diagnostics for nonparametric
regression and variance function modelling. (Area E). ([2], [3], [4], [6], [8], [10], [18], [20], [21],
[23], [30], [31])

Discussion of “Transformations in density estimation”. A discussion of how transformations in density
estimation affect estimation near boundaries.

Smooth  optimum  kernel  estimators  near  endpoints.  Kernel  estimators  for  smooth  curves  like  density, 
spectral  density  or  regression  functions  with  known  compact  support  require  modifications  when 
estimating near the endpoints of the support, both for practical and asymptotic reasons. The
construction of such boundary kernels as solutions of a variational problem is addressed and expansions
in terms of orthogonal polynomials are given, including explicit solutions for the most important
cases. Based on explicit formulas for certain functionals of the kernels, it is shown that local
bandwidth variation might be indicated near boundaries. Various bandwidth variation schemes and the
impact of boundary modifications on cross-validation bandwidths are investigated in a Monte Carlo
study.

Optimizing  kernel  methods:  A  unifying  variational  principle.  We  consider  a  variety  of  optimization 
problems connected with the choice of a kernel function. An example is the optimization of kernels
for  estimating  characteristic  points  of  a  curve  which  are  the  locations  of  extrema  of  higher  order 
derivatives.  We  discuss  the  problems  of  finding  ‘optimal’  kernels  minimizing  the  asymptotic  mean 
squared  error  in  this  context  and  that  of  ‘minimum  variance’  kernels  minimizing  the  asymptotic 
variance.  The  corresponding  variational  problems  are  analyzed  by  means  of  Jacobi  representations 
and  explicit  solutions  which  are  polynomials  with  compact  support  are  obtained.  It  is  then  shown 
that  in  fact  a  variety  of  other  variational  problems  connected  with  the  choice  of  optimal  kernel 
functions  are  equivalent  to  this  problem.  A  general  underlying  variational  principle  is  uncovered 
and investigated. The limiting case as the order of smoothness of the kernel tends to infinity is
studied, leading to analytic kernel functions on $\mathbb{R}$ for which an explicit Hermite representation is
found.  The  kernels  thus  obtained  provide  a  natural  extension  of  the  optimal  polynomial  kernels 
with  compact  support. 

Goodness-of-fit  diagnostics  for  regression  models.  For  a  fixed  design  regression  model,  we  compare  the 
fitting  of  parametric  linear  or  nonlinear  models  by  the  least  squares  method  with  a  model-free 
nonparametric  approach  via  kernel  estimates.  One  major  problem  for  such  a  comparison  is  the 
necessary  bandwidth  choice  for  the  nonparametric  estimate,  and  a  data-adaptive  method  for  local 
bandwidth  choice  based  on  the  parametric  fit  is  proposed.  As  an  application  we  consider  comparison 
of  both  estimates  and  of  corresponding  estimates  of  derivatives  at  a  finite  number  of  preselected 
points. This leads to a test statistic which is asymptotically $\chi^2$-distributed under the null hypothesis
that  the  parametric  model  contains  the  underlying  regression  function  g  and  has  asymptotic  power 
1  under  certain  contiguous  alternatives.  As  compared  to  other  proposed  goodness-of-fit  procedures, 
this  test  does  not  depend  on  the  subjective  choice  of  a  bandwidth.  Practical  issues  and  diagnostic 
plots  are  illustrated  in  a  data  application. 

Ultraspherical polynomial, kernel and hybrid estimators for nonparametric regression. We discuss
orthogonal polynomial series estimators for a regression function and its derivatives using weighted
(and in particular Ultraspherical) polynomials as the orthonormal basis. We consider also a
localized version of this method which is a hybrid between kernel and orthogonal polynomial approaches
and  can  be  shown  to  be  a  generalization  of  both.  Our  analysis  follows  from  the  observation  that  the 
orthogonal  polynomial  estimator  has  an  equivalent  kernel  interpretation,  where  the  bandwidth  is 
fixed  and  the  number  of  basis  elements  included  in  the  orthogonal  polynomial  estimator  corresponds 
to  the  order  of  the  equivalent  kernel  function.  This  equivalence  also  yields  a  new  construction  for 
boundary kernels. The hybrid estimators are more flexible than both kernel and orthogonal
polynomial estimators. In a study comparing finite integrated mean squared errors for various cases, the
behaviour of the hybrid estimator compared to the orthogonal polynomial estimators is assessed
and its superiority for some cases is indicated.

Preaveraged  localized  orthogonal  polynomial  estimators  for  surface  smoothing  and  partial  differentiation. 
We  propose  a  multivariate  smoothing  method  based  on  products  of  localized  orthogonal  polynomial 
series estimators for a smooth regression surface in the fixed design regression model. The
estimation of partial derivatives is included. The proposed method provides for automatic and efficient
boundary  modifications  near  the  edges  of  the  surface,  assuming  that  the  boundary  of  the  support  of 
the regression function satisfies some regularity conditions. By allowing for a preaveraging step, the
corresponding algorithms are sped up considerably and are easy to implement. Computation of
special boundary kernels, as required by the kernel method in order to avoid edge effects, is not
necessary. It is shown that under sufficient smoothness assumptions, the global average mean squared
error  has  the  same  optimal  rate  of  convergence  as  the  mean  squared  error  at  an  interior  point,  i.e., 
that  the  boundary  correction  is  asymptotically  effective.  The  method  depends  on  two  smoothing 
parameters,  one  determining  the  amount  of  preaveraging,  the  other  the  amount  of  smoothing  after 
preaveraging.  Theoretical  and  practical  bounds  for  the  choice  of  these  parameters  are  discussed.  A 
Monte  Carlo  study  based  on  a  bivariate  normal  surface  indicates  that  increasing  the  preaveraging 
parameter $\delta$ has a negative effect on the average mean squared error, which is not unexpected. On
the other hand, larger values of $\delta$ are computationally more economical. The effects of boundary
correction as compared to non-corrected estimates are investigated for the example of a quadratic
surface. The numerical complexity of the proposed method is discussed. The methods are
demonstrated and compared to kriging for two data sets, one on non-uniformly measured ground-water
levels in Arizona, the other on cover-clay thickness data from Iran measured on a regular mesh. The
two data analyses include regular and irregular designs and supports and seem to indicate that the
method works well, in particular when compared to kriging.

On variance function estimation with quadratic forms. We propose a class of general quadratic forms in
the dependent variable in order to estimate the variance function in a non-parametric
heteroscedastic fixed design regression model. It is shown that these estimators achieve improved rates of
convergence for the mean squared error as compared to estimators that were considered before.
Besides results on consistency and rates of convergence, the leading terms of the asymptotic
mean squared error are also obtained for some important cases. Several interesting special estimators
are discussed in more detail.
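
One simple quadratic form in the dependent variable is a kernel-weighted average of squared first
differences. The sketch below uses this well-known difference-based construction purely as an
illustration; it is not claimed to be one of the estimators proposed in the paper. It assumes NumPy,
a design sorted in increasing order, an Epanechnikov kernel and a user-supplied bandwidth h, and the
function name local_variance is hypothetical.

```python
import numpy as np

def local_variance(x, y, t_grid, h):
    """Kernel-weighted average of squared first differences as a variance function estimate.

    x must be sorted in increasing order. For a smooth regression function on a dense
    design, (y[i+1] - y[i])**2 / 2 has mean close to the local error variance.
    Every point of t_grid needs at least one design midpoint within distance h.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sq = 0.5 * np.diff(y) ** 2        # squared pseudo-residuals (quadratic in y)
    mid = 0.5 * (x[1:] + x[:-1])      # location assigned to each pseudo-residual
    t_grid = np.asarray(t_grid, dtype=float)
    u = (t_grid[:, None] - mid[None, :]) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return (w * sq[None, :]).sum(axis=1) / w.sum(axis=1)

if __name__ == "__main__":
    # Simulated heteroscedastic data (made up): standard deviation grows linearly in x.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 501)
    y = np.sin(2.0 * np.pi * x) + (0.1 + 0.4 * x) * rng.standard_normal(x.size)
    print(local_variance(x, y, [0.2, 0.5, 0.8], h=0.15))
```

Because differencing removes most of a smooth mean function, no preliminary regression fit is needed
in this sketch; the price is some bias when the regression function changes quickly relative to the
design spacing.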

Identity  reproducing  multivariate  nonparametric  regression.  Nonparametric  kernel  regression  estimators 
of the Nadaraya-Watson type are known to have an undesirable bias behavior. We propose a general
technique to improve the bias of any given multivariate non-parametric regression estimator based
on the requirement that the identity function should be reproduced, which is achieved by means
of an identity reproducing transformation of the predictor variable. The asymptotic distribution of
the identity reproducing version of the Nadaraya-Watson estimator is derived and is compared with
that of the untransformed Nadaraya-Watson estimator. It is demonstrated by means of a Monte
Carlo  study  that  the  asymptotic  improvements  are  noticeable  already  for  small  sample  sizes. 

On  boundary  kernel  method  for  nonparametric  curve  estimation  near  endpoints.  Kernel  estimators  for 
nonparametric function estimation are affected by boundary effects when estimating near an
endpoint of the support of the function. A general construction for boundary kernels is presented,
which allows these edge effects to be removed. It is shown that common kernel functions which satisfy
some mild requirements can be derived as the solution of a variational problem involving a certain
weight function. For the solutions of this variational problem, an explicit representation in
polynomials which are orthogonal with respect to this “associated weight function” is found; thus any
common kernel function can be represented as a product of an “associated weight function” and an
orthogonal expansion. It is demonstrated how this variational problem and its solution can be
extended to cover boundary kernels. The resulting explicit construction of boundary kernels includes
kernels  with  compact  as  well  as  noncompact  support,  and  examples  are  presented  demonstrating 
the  corresponding  boundary  kernels  for  compactly  supported  polynomial  kernels,  Gaussian  kernels 
with  unbounded  support,  and  related  analytical  kernel  functions. 
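
A boundary kernel written as a weight function times a low-order polynomial can be computed
numerically from moment conditions. The sketch below is an illustration under stated assumptions and
does not reproduce the paper's variational derivation: it assumes NumPy, the weight 1 - x^2 on the
truncated support [-1, q] with 0 <= q <= 1, and a linear polynomial factor chosen so that the kernel
integrates to one and has vanishing first moment. The function names are hypothetical.

```python
import numpy as np

def weight_moment(k, q):
    """Integral of x**k * (1 - x**2) over the truncated support [-1, q]."""
    def antiderivative(x):
        return x ** (k + 1) / (k + 1) - x ** (k + 3) / (k + 3)
    return antiderivative(q) - antiderivative(-1.0)

def boundary_kernel(q):
    """Return K_q(x) = (a + b*x) * (1 - x**2) on [-1, q], zero elsewhere.

    The coefficients a, b solve the two moment conditions
        integral of K_q = 1   and   integral of x * K_q = 0.
    """
    m0, m1, m2 = (weight_moment(k, q) for k in range(3))
    a, b = np.linalg.solve(np.array([[m0, m1], [m1, m2]]), np.array([1.0, 0.0]))

    def kernel(x):
        x = np.asarray(x, dtype=float)
        inside = (x >= -1.0) & (x <= q)
        return np.where(inside, (a + b * x) * (1.0 - x**2), 0.0)

    return kernel

if __name__ == "__main__":
    interior = boundary_kernel(1.0)   # full support: the Epanechnikov kernel
    edge = boundary_kernel(0.3)       # support cut off at q = 0.3
    print(interior(0.0), edge(0.0))
```

For q = 1 the first moment of the weight vanishes, so b = 0 and the construction returns the
Epanechnikov kernel; for q < 1 the linear factor compensates for the missing part of the support.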

Asymptotics  for  nonparametric  regression.  This  paper  reviews  and  compares  some  of  the  basic  ideas 
in  nonparametric  regression  with  Mahalanobis’  Fractile  Graphical  Analysis.  New  results  concern  a 
new proof of known properties of the “error areas” in Fractile Graphical Analysis using
concomitants and a discussion of the properties of fractile graphs as a method of nonparametric regression
(Fractile Regression). Furthermore, a general result for the local asymptotic distribution of
real-valued functions with arguments which are functionals formed by weighted averages of the data is
provided.  This  result  is  applied  to  derive  local  distributions  for  various  nonparametric  regression 
type  estimates,  including  estimators  for  derivatives. 

Orthogonal  polynomial  and  hybrid  estimators  for  nonparametric  regression.  This  algorithm  calculates 
a  nonparametric  regression  curve  and  derivative  estimate  based  on  a  hybrid  formulation  between 
kernel and orthogonal polynomial methods (Azari, Mack, Muller, 1992). The estimators depend
on two smoothing parameters which have to be provided by the user and contain both kernel and
orthogonal polynomial estimators as special cases, as well as an asymptotic equivalent of polynomial
regression. Automatic boundary corrections are included.

Estimating  direction  fields  in  autonomous  equation  models,  with  an  application  to  system  identification 
from cross-sectional data. We consider a situation where ‘cross-sectional’ observations follow an
underlying ‘longitudinal’ model with a population mean function. Measurements of the mean function
values without noise and of derivatives with noise are available, which are made at unknown
“observation times”. The population mean function lies in a nonparametric smoothness class, and while
not  fully  identifiable,  is  a  trajectory  in  the  direction  field  of  an  autonomous  ordinary  differential 
equation.  An  efficient  reconstruction  of  that  field  is  proposed  which  proceeds  from  nonparametric 
estimation  of  the  function  driving  the  differential  equation.  Rates  of  convergence  are  investigated.  A 
simulation  is  included  showing  the  recovery  of  the  mean  function.  Signal  and  noise  are  chosen  to  be 
typical  for  T4-cell  counts  in  a  small  cohort  of  undated  HIV-seroconverters,  and  the  reconstruction 
of  the  mean  function  is  seen  to  be  satisfactory. 
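
The reconstruction idea, estimating the function g that drives an autonomous equation x'(t) = g(x(t))
and then following the estimated direction field, can be sketched as follows. This is a minimal
illustration under stated assumptions rather than the paper's procedure: it assumes NumPy, a
Nadaraya-Watson smoother with Gaussian kernel and fixed bandwidth h for g, and explicit Euler steps
for the trajectory. The function names nw_regression and euler_trajectory are hypothetical.

```python
import numpy as np

def nw_regression(x_obs, d_obs, h):
    """Nadaraya-Watson estimate of g from pairs (level x_i, observed derivative d_i)."""
    x_obs = np.asarray(x_obs, dtype=float)
    d_obs = np.asarray(d_obs, dtype=float)

    def g_hat(x):
        w = np.exp(-0.5 * ((x - x_obs) / h) ** 2)  # Gaussian kernel weights
        return float(np.sum(w * d_obs) / np.sum(w))

    return g_hat

def euler_trajectory(g_hat, x0, t_end, dt):
    """Follow the estimated direction field from x0 with explicit Euler steps."""
    n_steps = int(round(t_end / dt))
    path = np.empty(n_steps + 1)
    path[0] = x0
    for k in range(n_steps):
        path[k + 1] = path[k] + dt * g_hat(path[k])
    return path

if __name__ == "__main__":
    # Toy model (made up): x' = -x, observed without times; derivatives carry noise.
    rng = np.random.default_rng(1)
    levels = np.linspace(0.05, 1.5, 300)
    derivatives = -levels + 0.05 * rng.standard_normal(levels.size)
    g_hat = nw_regression(levels, derivatives, h=0.05)
    path = euler_trajectory(g_hat, x0=1.0, t_end=1.0, dt=0.001)
    print(path[-1], np.exp(-1.0))
```

Only the pairs of levels and derivatives enter the estimate, so the unknown observation times are
never needed; the time origin of the reconstructed trajectory is fixed by the chosen starting value.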